planning and learning
Planning and Learning: Path-Planning for Autonomous Vehicles, a Review of the Literature
Osanlou, Kevin, Guettier, Christophe, Cazenave, Tristan, Jacopin, Eric
This short review aims to familiarize the reader with state-of-the-art work on planning, scheduling and learning. First, we study state-of-the-art planning algorithms. We give a brief introduction to neural networks. Then we explore in more detail graph neural networks, a recent variant of neural networks suited for processing graph-structured inputs. We briefly describe the concept of reinforcement learning algorithms and some approaches designed to date. Next, we study some successful approaches that use neural networks for path-planning. Lastly, we focus on temporal planning problems with uncertainty.
What Robots Need to Succeed: Machine-Learning to Teach Effectively - Robotics Business Review
The mid-twentieth-century sociologist David Riesman was perhaps the first to wonder with unease what people would do with all of their free time once the encroaching machine automation of the 1960s liberated humans from their menial chores and decision-making. His prosperous, if anxious, vision of the future only half came to pass, however, as the complexities of life expanded to continually fill the days of both man and machine. Work alleviated by industrious machines, such as robotics systems, in the ensuing decades only freed humans to create increasingly elaborate new tasks to be labored over. Rather than give us more free time, the machines gave us more time to work.
Machine Learning
Today, the primary man-made assistants helping humans with their work are decreasingly likely to take the form of an assembly line of robot limbs or the robotic butlers first dreamed up during the era of the Space Race.
A novel approach to model exploration for value function learning
Ajanovic, Zlatan, Beglerovic, Halil, Lacevic, Bakir
Planning and learning are complementary approaches. Planning relies on deliberative reasoning about the current state and the sequence of future reachable states to solve the problem. Learning, on the other hand, focuses on improving system performance based on experience or available data. Learning to improve the performance of planning based on experience with similar, previously solved problems is ongoing research. One approach is to learn a value function (cost-to-go), which can be used as a heuristic for speeding up search-based planning. Existing approaches in this direction use the results of previous searches to learn the heuristic. In this work, we present a search-inspired approach of systematic model exploration for learning the value function, which does not stop when a plan is available but rather prolongs the search so that not only the resulting optimal path is used but also an extended region around it. This, in turn, improves both the efficiency and robustness of successive planning. Additionally, the effect of losing admissibility by using a machine-learned (ML) heuristic is managed by bounding the ML heuristic with other admissible heuristics.
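The idea of bounding a learned heuristic with an admissible one can be illustrated with a minimal sketch. It is not taken from the paper: the clipping rule (keeping the learned estimate within `[h_adm, epsilon * h_adm]`), the grid world, and the names `bounded_heuristic`, `learned_value`, and `epsilon` are all assumptions made for illustration. Under this particular rule, A* with node reopening returns a path whose cost is at most `epsilon` times the optimum.

```python
import heapq


def manhattan(a, b):
    """Admissible heuristic on a 4-connected grid with unit step cost."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def bounded_heuristic(h_learned, h_admissible, epsilon=1.5):
    """Clip a learned cost-to-go estimate into [h_adm, epsilon * h_adm].

    Assumed bounding rule (hypothetical, for illustration only).
    """
    return max(h_admissible, min(h_learned, epsilon * h_admissible))


def astar(grid, start, goal, learned_value, epsilon=1.5):
    """A* over a grid of 0 (free) and 1 (blocked) cells.

    `learned_value(node, goal)` stands in for a trained value function.
    Returns the cost of the path found, or None if the goal is unreachable.
    """
    open_heap = [(0.0, start)]
    best_g = {start: 0}
    while open_heap:
        _, node = heapq.heappop(open_heap)
        if node == goal:
            return best_g[node]
        r, c = node
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if not (0 <= nr < len(grid) and 0 <= nc < len(grid[0])):
                continue  # outside the grid
            if grid[nr][nc] == 1:
                continue  # blocked cell
            g = best_g[node] + 1
            if g < best_g.get(nxt, float("inf")):
                # Re-push whenever a cheaper route is found (node reopening).
                best_g[nxt] = g
                h = bounded_heuristic(
                    learned_value(nxt, goal), manhattan(nxt, goal), epsilon
                )
                heapq.heappush(open_heap, (g + h, nxt))
    return None


def overestimating_value(node, goal):
    """Stand-in for a learned value function that overestimates cost-to-go."""
    return 3.0 * manhattan(node, goal)
```

Calling `astar(grid, start, goal, overestimating_value, epsilon=1.0)` collapses the bound to the admissible heuristic itself, so the search is ordinary A*; larger `epsilon` values let the learned estimate influence the search more, at the price of a looser optimality bound.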
Planning and Learning for Decentralized MDPs with Event Driven Rewards
Gupta, Tarun (International Institute of Information Technology, Hyderabad) | Kumar, Akshat (Singapore Management University) | Paruchuri, Praveen (International Institute of Information Technology, Hyderabad)
Decentralized (PO)MDPs provide a rigorous framework for sequential multiagent decision making under uncertainty. However, their high computational complexity limits their practical impact. To address scalability and real-world impact, we focus on settings where a large number of agents primarily interact through complex joint rewards that depend on their entire histories of states and actions. Such history-based rewards encapsulate the notion of events or tasks, such that the team reward is given only when the joint task is completed. Algorithmically, we contribute: 1) a nonlinear programming (NLP) formulation for such an event-based planning model; 2) a probabilistic inference-based approach that scales much better than NLP solvers for a large number of agents; 3) a policy gradient-based multiagent reinforcement learning approach that scales well even for exponential state spaces.
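The notion of a history-based, event-driven team reward can be made concrete with a small sketch. It is only an illustration under simplifying assumptions, not the paper's formal model: an event is assumed to be a required state per agent, a history is a list of `(state, action)` pairs, and the names `event_occurred` and `team_reward` are hypothetical.

```python
def event_occurred(history, required_state):
    """True if the agent visited `required_state` at any point in its history.

    `history` is a list of (state, action) pairs (assumed representation).
    """
    return any(state == required_state for state, _action in history)


def team_reward(histories, event, reward):
    """Pay the team reward only if every involved agent completed its part.

    `histories` maps agent id -> history; `event` maps agent id -> the state
    that agent must have visited for the joint task to count as completed.
    """
    completed = all(
        event_occurred(histories[agent], state) for agent, state in event.items()
    )
    return reward if completed else 0.0


# Two agents must both have reached the site "B" for the joint task to pay off.
histories = {
    "agent_1": [("A", "move"), ("B", "wait")],
    "agent_2": [("C", "move"), ("D", "move")],
}
joint_task = {"agent_1": "B", "agent_2": "B"}
partial = team_reward(histories, joint_task, reward=10.0)

histories["agent_2"].append(("B", "wait"))
full = team_reward(histories, joint_task, reward=10.0)
```

Because the reward depends on whole histories rather than on the current joint state, no agent receives partial credit in this sketch: `partial` is zero until the last agent's history contains its required state.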
The MADP Toolbox: An Open-Source Library for Planning and Learning in (Multi-)Agent Systems
Oliehoek, Frans A. (University of Liverpool, University of Amsterdam) | Spaan, Matthijs T. J. (Delft University of Technology) | Robbel, Philipp (Massachusetts Institute of Technology) | Messias, Joao (University of Amsterdam)
This article describes the MultiAgent Decision Process (MADP) toolbox, a software library to support planning and learning for intelligent agents and multiagent systems in uncertain environments. Some of its key features are that it supports partially observable environments and stochastic transition models; has unified support for single- and multiagent systems; provides a large number of models for decision-theoretic decision making, including one-shot decision making (e.g., Bayesian games) and sequential decision making under various assumptions of observability and cooperation, such as Dec-POMDPs and POSGs; provides tools and parsers to quickly prototype new problems; provides an extensive range of planning and learning algorithms for single- and multiagent systems; and is written in C++ and designed to be extensible via the object-oriented paradigm.